Segniliparus rugosus is one of two species in the genus Segniliparus, the sole genus in the family Segniliparaceae. A unique and interesting feature of this family is the presence of extremely long carbon-chain mycolic acids bound in the cell wall. S. rugosus is also medically important because it is an opportunistic pathogen associated with mammalian lung disease. It is the second species in the genus to have its genome sequenced. The 3,567,567 bp genome, with 3,516 protein-coding and 49 RNA genes, is part of the NIH Roadmap for Medical Research, Human Microbiome Project.



Introduction 



Strain CDC 945T (= ATCC BAA-974T = CIP 10838T = DSM 45345 = CCUG 50838T = JCM 13579T) is the type strain of the species Segniliparus rugosus in the family Segniliparaceae [1]. The genus name was created to acknowledge the presence of novel long carbon-chain fatty acids (mycolic acids) detected with the high-performance liquid chromatography (HPLC) method used for Mycobacterium species identification [2]. The name combines the Latin adjective 'segnis', meaning 'slow', with the Greek adjective 'liparos', meaning 'fatty', to indicate 'the one with slow fats'; it refers to the late elution of the apolar alpha-mycolic acids (fatty acids) during HPLC analysis [1]. The specific epithet comes from the Latin adjective 'rugosus', referring to the wrinkled, rough colony morphology [1]. The type strain of S. rugosus, CDC 945T, was isolated from a human sputum specimen collected in Alabama, USA [1]. S. rugosus has been isolated from multiple patients with cystic fibrosis in the U.S. and Australia and appears to be an opportunistic respiratory pathogen [3,4]. A recent isolation from a ~1-year-old sea lion showing third-stage malnutrition with a 30% loss of body weight, moderate bradycardia and severe hypothermia suggests a possible aquatic or marine niche for the species [5]. The only other validly named species of the genus is Segniliparus rotundus, whose type strain is CDC 1076T. S. rotundus CDC 1076T shares 98.9% 16S rRNA sequence identity with S. rugosus CDC 945T, although DNA-DNA hybridization between the two strains is less than 28% [1]. The complete genome of S. rotundus was recently reported; it is 3,157,527 bp long, with 3,081 protein-coding and 52 RNA genes [6]. Here we present a summary classification and a set of features for S. rugosus CDC 945T, together with a description of its high-quality draft genome sequence and annotation.
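Sequence identity and DNA-DNA hybridization measure different things: 16S rRNA identity compares a single, highly conserved gene, so two strains can share 98.9% identity there while their whole genomes hybridize at less than 28%. As a minimal sketch, assuming two sequences that are already aligned and a convention that skips gap columns, pairwise percent identity can be computed as below. The function name and toy sequences are illustrative only; published tools differ in how they count gaps, so their values are not always interchangeable.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; identity is
    matches / compared columns (one common convention among several).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real 16S rRNA sequence.
    print(percent_identity("ACGT-A", "acgttt"))  # 4 matches in 5 ungapped columns -> 80.0
```

Other conventions, such as counting gaps as mismatches or dividing by the full alignment length, give lower values for gapped alignments.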



Classification and features 



The cells of CDC 945T are irregular rods, 0.55-0.90 µm wide and 1.9-4.5 µm long (Table 1 and Figure 1). Colonies are wrinkled and rough and form in less than 7 days on 7H10 and 7H11 agar at the optimal temperature of 33 °C [1]. CDC 945T is aerobic, non-motile and asporogenous, and stains bright red with acid-alcohol stain [1]. It is mesophilic, growing between 22 and 42 °C [1]. Colonies grown for less than 4 weeks are non-pigmented and nonphotochromogenic and do not produce a diagnostic odor [1]. They do not produce aerial mycelium or spores, and do not show true branching. Young colonies are creamy and smear easily when disturbed. After ~4 weeks of growth on Lowenstein-Jensen (LJ) medium, a diffusible pink color appears in the agar at the leading edges of mature growth. Aged colonies on LJ develop a light buff pigment and show 'greening' from uptake of malachite green [1]. CDC 945T is weakly positive for arylsulfatase at 7 days and strongly positive at 14 days. It does not grow on MacConkey agar without crystal violet. CDC 945T grows in the presence of 5% sodium chloride at 7 days and in the presence of lysozyme at 21 days. It is positive for iron uptake, nitrate reduction, tellurite reduction and Tween opacity, and negative for Tween hydrolysis [1].

Results with the API CORYNE test kit show that CDC 945T is positive for β-glucosidase and pyrazinamidase activities and negative for alkaline phosphatase, β-galactosidase, β-glucuronidase, α-glucosidase, N-acetyl-β-glucosaminidase and pyrrolidonyl arylamidase activities at 33 °C [1]. It is susceptible to imipenem (4 µg/ml), moxifloxacin (0.5 µg/ml) and trimethoprim-sulfamethoxazole (<4.8 µg/ml), intermediate to cefoxitin (64 µg/ml), and resistant to amikacin (>128 µg/ml), clarithromycin (32 µg/ml), ciprofloxacin (16 µg/ml), ethambutol (>16 µg/ml) and tobramycin (>64 µg/ml) [1,3]. Strain CDC 945T uses D-glucose, glycerol, maltose, mannitol, D-sorbitol and trehalose as sole carbon sources, with the production of acid. It does not grow on adonitol, L-arabinose, cellobiose, citrate, dulcitol, i-erythritol, galactose, i-myo-inositol, lactose, mannose, melibiose, raffinose, L-rhamnose, salicin or sodium citrate [1]. The strain hydrolyzes urea but not acetamide, adenine, casein, aesculin, hypoxanthine, tyrosine or xanthine [1].

Chemotaxonomy 

The cell wall of strain CDC 945T contains mycolic acids and meso-diaminopimelic acid [1]. The HPLC mycolic acid pattern is a double cluster of peaks emerging at 7.24 min; the last peak group is unresolved and elutes slightly before the high-molecular-weight internal standard (110-carbon chain length) [1,2]. Thin-layer chromatography confirms two groups of apolar alpha-mycolic acids (α and α') that lack oxygen functions other than the hydroxyl group [1]. The HPLC and TLC results indicate that this strain produces a unique homologous subclass of long alpha-mycolic acids of 90 to 110 carbons [1]. The fatty acid profile determined by gas-liquid chromatography is C10:0 (8.65%), C12:0 (1.33%), C14:0 (8.49%), C16:0 (18.34%), C18:1 ω9c (8.93%), 10-methyl C18:0 (tuberculostearic acid, 21.62%) and C20 (28.51%) [1].
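The fatty acid names above use the usual shorthand: in Cx:y, x is the number of carbons in the main chain and y the number of double bonds; ω9c marks a cis double bond starting at carbon 9 counted from the methyl (ω) end, and 10-methyl indicates a methyl branch at C-10. The sketch below is an illustrative helper, not part of the analysis reported in [1]: it parses that shorthand and totals the listed percentages, which add up to 95.87% of the profile. The branch carbon of tuberculostearic acid (19 carbons in total) is not counted in the chain length it returns.

```python
from __future__ import annotations

import re

# Fatty acid percentages for CDC 945T as listed in the text [1].
PROFILE = {
    "C10:0": 8.65,
    "C12:0": 1.33,
    "C14:0": 8.49,
    "C16:0": 18.34,
    "C18:1 ω9c": 8.93,
    "10-methyl C18:0": 21.62,  # tuberculostearic acid
    "C20": 28.51,  # number of double bonds not given in the source
}

# "C<chain carbons>" optionally followed by ":<double bonds>".
SHORTHAND = re.compile(r"C(\d+)(?::(\d+))?")


def parse_shorthand(name: str) -> tuple[int, int | None]:
    """Return (main-chain carbons, double bonds), with None if bonds are unstated.

    Branches such as '10-methyl' are ignored, so the chain length excludes
    the methyl branch carbon.
    """
    match = SHORTHAND.search(name)
    if match is None:
        raise ValueError(f"unrecognized fatty acid name: {name!r}")
    carbons = int(match.group(1))
    bonds = int(match.group(2)) if match.group(2) is not None else None
    return carbons, bonds


if __name__ == "__main__":
    for name, pct in PROFILE.items():
        carbons, bonds = parse_shorthand(name)
        shown = "?" if bonds is None else str(bonds)
        print(f"{name:<16} chain C{carbons:<3} double bonds {shown:<2} {pct:6.2f}%")
    print(f"Listed components: {sum(PROFILE.values()):.2f}%")
```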

Figure 2 shows the phylogenetic position of Segniliparus species in a 16S rRNA-based tree. The genus forms a lineage distinct from the other mycolic acid-containing Actinobacteria. This placement is consistent with its position in the "All-Species Living Tree Project" (LTP) release 106, August 2011, which is also based on 16S rRNA [11].
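The support values on the Figure 2 tree come from nonparametric bootstrapping: alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and a node's support is the percentage of replicates that recover it. The sketch below shows only the resampling step and the 70% display threshold from the figure legend. Tree inference itself (done with PHYML for Figure 2) is not reproduced, and the function names and toy alignment are hypothetical.

```python
from __future__ import annotations

import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One nonparametric bootstrap pseudo-replicate of an alignment.

    Columns are drawn with replacement until the replicate has as many
    columns as the original; every sequence uses the same drawn columns.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def node_label(supporting: int, replicates: int = 1000, threshold: float = 70.0) -> str:
    """Bootstrap support (%) for a node, or '' if below the display threshold."""
    support = 100.0 * supporting / replicates
    return f"{support:.0f}" if support >= threshold else ""


if __name__ == "__main__":
    toy = ["ACGTTG", "ACGATG", "TCGATC"]  # hypothetical aligned fragments
    rng = random.Random(42)  # fixed seed so the example is reproducible
    print(bootstrap_replicate(toy, rng))
    for count in (1000, 850, 700, 699):
        print(count, repr(node_label(count)))  # 699 of 1000 falls below 70% -> ''
```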

Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its association with lung disease and is part of the NIH Roadmap for Medical Research, Human Microbiome Project (HMP) [12]. The HMP lists its reference genomes in the Genomes OnLine Database (GOLD) [10] and in the Human Microbiome Project Data Analysis and Coordination Center Project Catalog [13], and the high-quality draft genome sequence is deposited in GenBank [14]. The Broad Institute performed the sequencing and annotation of this high-quality draft genome [15]. A summary of the project is given in Table 2.

Growth conditions and DNA isolation 

Strain CDC 945T was grown statically in Middlebrook 7H9 medium at 33 °C until late log phase. DNA was isolated from whole cells after a chloroform/methanol wash, using a disruption solution of guanidine thiocyanate, sarkosyl and mercaptoethanol as described by Mve-Obiang et al. [16]. The purity of the DNA was assessed by the Broad Institute using the Quant-iT™ dsDNA Assay High Sensitivity Kit (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol.

Figure 2. Maximum likelihood phylogenetic tree generated using PHYML v2.2.4 [7] from 16S rRNA sequences, highlighting the position of S. rugosus CDC 945T relative to the type strains of the other mycolic acid-containing genera in the suborder Corynebacterineae. GenBank accession numbers are listed after the names. The tree was inferred from 1,468 aligned positions (alignment with Clustal W [8] in MEGA v4 [9]). Numbers at the branch nodes are support values from 1,000 bootstrap replicates, shown if equal to or greater than 70%. The scale bar indicates substitutions per site. The tree was rooted with Streptomyces coelicolor. Lineages with type strain genome sequencing projects registered in GOLD [10] are shown in blue, published genomes in bold.

Table 1. Classification and general features of S. rugosus CDC 945T according to the MIGS recommendations [25].

MIGS ID    Property                  Term                                       Evidence code
           Current classification    Domain Bacteria                            TAS [26]
                                     Phylum Actinobacteria                      TAS [27]
                                     Class Actinobacteria                       TAS [28]
                                     Subclass Actinobacteridae                  TAS [28]
                                     Order Actinomycetales                      TAS [28,29]
                                     Suborder Corynebacterineae                 TAS [28,29]
                                     Family Segniliparaceae                     TAS [1,29]
                                     Genus Segniliparus                         TAS [1]
                                     Species Segniliparus rugosus               TAS [1]
                                     Type strain CDC 945T                       TAS [1]
           Gram stain                not reported
           Cell shape                rods, irregular                            TAS [1]
           Motility                  nonmotile                                  TAS [1]
           Sporulation               non-sporulating                            TAS [1]
           Temperature range         mesophile, 22-42 °C                        TAS [1]
           Optimum temperature       33 °C                                      TAS [1]
           Salinity                  unknown
MIGS-22    Oxygen requirement        aerobic                                    TAS [1]
           Carbon source             D-glucose, glycerol, maltose, mannitol,    TAS [1]
                                     D-sorbitol and trehalose
           Energy source             chemoorganotroph                           TAS [1]
MIGS-6     Habitat                   environmental water suggested              TAS [5]
MIGS-15    Biotic relationship       likely free-living                         NAS [5]
MIGS-14    Pathogenicity             opportunistic pathogen                     TAS [1,3]
           Biosafety level           2                                          TAS [1,3,30]
           Isolation                 sputum, human                              TAS [1,3]
MIGS-4     Geographic location       Alabama, USA                               TAS [1]
MIGS-4.1   Latitude                  not reported
MIGS-4.2   Longitude                 not reported
MIGS-4.3   Depth                     not reported
MIGS-4.4   Altitude                  not reported
MIGS-5     Sample collection time    1998                                       TAS [1]

Evidence codes: TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [31].

Table 2. Genome sequencing project information

MIGS ID     Property                      Term
MIGS-31     Finishing quality             High Quality Draft
MIGS-28     Libraries used                Two 454 pyrosequence libraries, one standard 0.6 kb fragment library
                                          and one 2.5 kb jump library
MIGS-29     Sequencing platforms          454 Titanium
MIGS-31.2   Sequencing coverage           13x
MIGS-30     Assemblers                    Newbler Assembler version 2.3 PostRelease-11/19/2009
MIGS-32     Gene calling method           Glimmer; Metagene; PFAM; BLAST to non-redundant protein database;
                                          manual curation
            GenBank ID                    ACZI01000000
            GenBank Date of Release       November 10, 2010
            GOLD ID                       Gi05259
            NCBI project ID               40685
MIGS-13     Source material identifier    ATCC BAA-974T
            Project relevance             Human Microbiome Project



Genome sequencing and assembly 

The genome of Segniliparus rugosus ATCC BAA-974 was sequenced using 454 pyrosequence fragment and jump libraries [17]. We assembled the 454 data, consisting of 135,510 fragment reads and 112,271 jump reads, using Newbler Assembler version 2.3 PostRelease-11/19/2009. The assembly is considered High-Quality Draft and consists of 262 contigs arranged in 30 scaffolds, with a total size of 3,567,567 bases. The error rate of this draft genome sequence is less than 1 in 10,000 (accuracy of ~Q40). Average sequence coverage is 13x. Coverage, GC content, contig BLAST results and 16S contig classification were all consistent with the genus Segniliparus.
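The accuracy figure above follows the standard Phred convention, Q = -10 log10(p) for a per-base error probability p, so an error rate of 1 in 10,000 (p = 10^-4) corresponds to Q40. Mean coverage is conventionally the total number of sequenced bases divided by the genome size. The helpers below are a minimal sketch of those two textbook definitions. They are not Newbler functions, and because the text does not state exactly how the 13x figure was computed, the base count in the usage note is only an order-of-magnitude illustration.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(p), p = per-base error probability."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    return -10.0 * math.log10(error_rate)


def error_rate(q: float) -> float:
    """Inverse of phred_quality: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def bases_for_coverage(coverage: float, genome_size: int) -> float:
    """Total sequenced bases implied by a mean coverage (coverage * genome size)."""
    return coverage * genome_size


if __name__ == "__main__":
    print(f"1 error in 10,000 bases -> Q{phred_quality(1e-4):.0f}")
    print(f"Q40 -> error rate {error_rate(40):.0e}")
    print(f"13x over 3,567,567 bp -> {bases_for_coverage(13, 3_567_567):,.0f} bases")
```

For example, bases_for_coverage(13, 3_567_567) returns 46,378,371, i.e. roughly 46 Mb of sequence for 13x over the 3,567,567 bp assembly.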



Genome annotation

Protein-coding genes were predicted using four ORF-finding tools: GeneMark [18], Glimmer3 [19], Metagene [20] and findBlastOrfs (unpublished). The latter builds genes by extending whole-genome BLAST alignments, in frame, to include start and stop codons. The final set of non-overlapping ORFs was selected from the output of these tools using an in-house gene caller, which uses dynamic programming to score candidate gene models based on the strength of their similarity to entries in UniRef90, then selects the non-overlapping genes that, combined, have the highest overall score. Where predictions overlapped non-coding RNA features (see below), the genes were manually inspected and removed when necessary. Finally, the gene set was reviewed using both the NCBI discrepancy report and the internal Broad annotation metrics.

Ribosomal RNAs (rRNAs) were identified with RNAmmer [21], tRNA features with tRNAscan [22] and other non-coding features with RFAM [23]. Gene product names were assigned based on HMMER equivalogs from TIGRFAMs and Pfam, and on BLAST hits to the KEGG and SwissProt protein sequence databases, using the naming tool "Pidgin" [24].


Genome properties 

This 3,567,567 bp draft genome has a high G+C content (Table 3 and Figure 3) and is predicted to encode 3,571 genes, 98% of which are protein-coding. Nearly 70% of the predicted genes have a functional prediction, and COG functional categories have been assigned to 53% of them (Table 4).
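The percentages quoted above are shares of the 3,571 predicted genes, which equal the 3,522 protein-coding genes plus the 46 tRNA and 3 rRNA genes (the 49 RNA genes). The short sketch below recomputes the gene rows of the "% of Total" column in Table 3 from the counts. It reproduces them to within 0.01 percentage points, with the remaining differences in the last rounded digit. The dictionary labels are copied from the table; the function name is illustrative.

```python
TOTAL_GENES = 3571  # Table 3, "Total genes"

# Gene counts from Table 3 ("Pseudo genes (partial genes)" split into two rows).
COUNTS = {
    "Protein-coding genes": 3522,
    "tRNA genes": 46,
    "rRNA genes": 3,
    "Pseudo genes": 6,
    "Partial genes": 233,
    "Genes with function prediction": 2486,
    "Genes in paralog clusters": 194,
    "Genes assigned to COGs": 1897,
    "Genes assigned Pfam domains": 2451,
    "Genes with signal peptides": 412,
    "Genes with transmembrane helices": 590,
}


def percent_of_total(count: int, total: int = TOTAL_GENES) -> float:
    """Share of all predicted genes, in percent."""
    return 100.0 * count / total


if __name__ == "__main__":
    rna_genes = COUNTS["tRNA genes"] + COUNTS["rRNA genes"]
    print(f"RNA genes: {rna_genes}")
    print(f"Protein-coding + RNA genes: {COUNTS['Protein-coding genes'] + rna_genes}")
    for label, n in COUNTS.items():
        print(f"{label:<34}{n:>6,}  {percent_of_total(n):6.2f}%")
```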

Figure 3. Graphical circular map of the genome. From outside to the center: genes on the forward strand (colored by COG category), genes on the reverse strand (colored by COG category), GC content, GC skew.

Table 3. Genome Statistics

Attribute                            Value        % of Total
Genome size (bp)                     3,647,826    100.00%
DNA coding region (bp)               3,156,492    86.35%
DNA G+C content (bp)                 2,484,899    68.12%
Number of replicons                  unknown
Extrachromosomal elements            unknown
Total genes                          3,571        100.0%
tRNA genes                           46           1.28%
rRNA genes                           3            0.08%
rRNA operons                         1            0.03%
CRISPR repeats                       0
Protein-coding genes                 3,522        98.62%
Pseudo genes (partial genes)         6 (233)      0.17% (6.52%)
Genes with function prediction       2,486        69.62%
Genes in paralog clusters            194          5.43%
Genes assigned to COGs               1,897        53.12%
Genes assigned Pfam domains          2,451        68.64%
Genes with signal peptides           412          11.54%
Genes with transmembrane helices     590          16.52%



Table 4. Number of genes associated with the general COG functional categories

Code   Value   %age   Description
J        121    3.4   Translation, ribosomal structure and biogenesis
A          1    0.0   RNA processing and modification
K         89    2.5   Transcription
L         84    2.4   Replication, recombination and repair
B          0    0.0   Chromatin structure and dynamics
D         18    0.5   Cell cycle control, cell division, chromosome partitioning
Y          0    0.0   Nuclear structure
V         22    0.6   Defense mechanisms
T         51    1.4   Signal transduction mechanisms
M         85    2.4   Cell wall/membrane/envelope biogenesis
N          3    0.1   Cell motility
Z          0    0.0   Cytoskeleton
W          0    0.0   Extracellular structures
U         11    0.3   Intracellular trafficking and secretion, and vesicular transport
O         77    2.2   Posttranslational modification, protein turnover, chaperones
C        140    4.0   Energy production and conversion
G        100    2.8   Carbohydrate transport and metabolism
E        225    6.4   Amino acid transport and metabolism
F         74    2.1   Nucleotide transport and metabolism
H        102    2.9   Coenzyme transport and metabolism
I        120    3.4   Lipid transport and metabolism
P        110    3.1   Inorganic ion transport and metabolism
Q         92    2.6   Secondary metabolites biosynthesis, transport and catabolism
R        246    7.0   General function prediction only
S        126    3.6   Function unknown
        1625   46.1   Not in COGs
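The "%age" column of Table 4 uses a different denominator from Table 3: its values are consistent with percentages of the 3,522 protein-coding genes rather than of all 3,571 genes. The category counts also sum to the 1,897 "Genes assigned to COGs" in Table 3, and 1,897 plus the 1,625 genes not in COGs gives 3,522. The following sketch checks both points; the variable and function names are illustrative.

```python
PROTEIN_CODING_GENES = 3522  # Table 3
NOT_IN_COGS = 1625           # Table 4

# Gene counts per COG functional category (Table 4).
COG_COUNTS = {
    "J": 121, "A": 1, "K": 89, "L": 84, "B": 0, "D": 18, "Y": 0, "V": 22,
    "T": 51, "M": 85, "N": 3, "Z": 0, "W": 0, "U": 11, "O": 77, "C": 140,
    "G": 100, "E": 225, "F": 74, "H": 102, "I": 120, "P": 110, "Q": 92,
    "R": 246, "S": 126,
}


def percent(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Share of protein-coding genes, in percent."""
    return 100.0 * count / total


if __name__ == "__main__":
    assigned = sum(COG_COUNTS.values())
    print(f"Genes in COG categories: {assigned}")
    print(f"{assigned} + {NOT_IN_COGS} not in COGs = {assigned + NOT_IN_COGS}")
    for code, n in COG_COUNTS.items():
        print(f"{code}  {n:4d}  {percent(n):4.1f}")
    print(f"Not in COGs  {NOT_IN_COGS}  {percent(NOT_IN_COGS):4.1f}")
```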